Abstract

Fault tolerant quantum computing methods which work with efficient 
quantum error correcting codes are discussed. Several new techniques are 
introduced to restrict accumulation of errors before or during the recovery. 
Classes of eligible quantum codes are obtained, and good candidates exhib- 
ited. This permits a new analysis of the permissible error rates and minimum 
overheads for robust quantum computing. It is found that, under the stan- 
dard noise model of ubiquitous stochastic, uncorrelated errors, a quantum 
computer need be only an order of magnitude larger than the logical machine 
contained within it in order to be reliable. For example, a scale-up by a factor 
of 22, with gate error rate of order $10^{-5}$, is sufficient to permit large quantum
algorithms such as factorization of thousand-digit numbers. 

keywords Quantum error correction, quantum computing, fault tolerant

The future of computing lies in fault-tolerant architectures. This is true both
of classical computing methods, and of future quantum computers. In both cases 
the reason is that it is much easier to build a device with significant imperfections, 
but with the flexibility to work around them, than it is to build an essentially 
"perfect" physical device (one whose chances of failure during any required task are 
acceptably small). There is a profound dichotomy at work here, between the power 
of information processing, and the effects of random noise and imprecision. The 
central point is that information processing itself provides powerful techniques to 
protect against information loss. 

It is a striking feature of biology that from the molecular level (e.g. transcription 
of DNA), up to the operations of organs or the whole organism, the operating 
principle involves imperfect structures with built-in self-correction, rather than close 
to perfect structures. In classical computing methods, self-correction has played a 
part, but it has not so far been such a central and all-pervasive ingredient. However, 
this appears set to change, since as circuit elements get smaller it becomes at once 
harder to make them precisely and easier to make sufficiently many that a fraction 
can be devoted to error-correction at little cost [1].

In quantum computing the need for error correction is paramount right from 
the start, since it appears that it may be impossible to find a physical system which 
could be sufficiently precise and isolated to constitute a useful 'bare' quantum com- 
puter. Here, by a 'bare' quantum device we mean one whose physical operation only 
involves elements (qubits, gates, etc.) essential to the logical structure of the task 
in hand, and by a 'useful' quantum computer we mean a general-purpose quantum 
computational device which could tackle computing tasks not readily solvable by 
other means (such as classical computing). It seemed up until only a few years ago 
that this difficulty ruled out useful quantum computers altogether, since it was un- 
known how to achieve error correction in quantum processing. We now know how 
to realise quantum error correction [2], [3], [4], [5], [6], [7], [8], [9], [10], [11] and fault tolerant
quantum circuits [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], at a cost in the size and
speed of the computer. Thus, useful quantum computing appears to be allowed by 
the laws of nature, and there are two questions which present themselves: 

1. What is the maximum quantum computing power achievable in a system of 
given dimension and noise rates? 

2. How is the maximum attained? 



These questions are important both from the point of view of our understanding 
of fundamental physics, and from the practical point of view of building quantum 
computers. They are the subject of this paper. Their practical importance is large
because quantum computers are so hard to make: controllable qubits are a precious 
resource which we wish to use as efficiently as possible. Up till now, proposed fault 
tolerant quantum computing methods have been inefficient because they are based 
on inefficient quantum coding, in which only a single qubit is stored in each block 
of n qubits []TBL ITR ITDI, EDI. This results in a physical quantum computer which
is a hundred to a thousand times larger than the logical machine embedded within it,
if we wish to run a large quantum algorithm such as factorization of hundred- or
thousand-digit numbers |L9, Ej).



It is known that more efficient codes exist 1, |, | |, § § §. Knill
discussed ways to find operations on general codes and recently Gottesman [21]
derived a complete set of fault tolerant operations which can work with efficient 
quantum codes. However, these methods require further refinement if we are to 
profit by them, otherwise the greater complexity of the operations lowers the error 
tolerance, thus offsetting the gain in coding efficiency. 

This paper discusses fault tolerant computing using efficient quantum codes, 
including specific example codes and estimates of the noise rates which can be tolerated.
Section 1 considers universal sets of quantum gates, and sections 2 and 3 discuss a
universal set of fault tolerant operations for Calderbank Shor Steane (CSS) quantum
codes satisfying certain requirements. Section 4 obtains classes of codes satisfying
the requirements, and section 5 discusses fault tolerant recovery for these codes.
The analysis of the whole method yields an estimate for the error rate which can 
be tolerated and the total computer size needed. The main conclusion is remark- 
able: to run a given large quantum algorithm, with given tolerated error rates in 
the memory and elementary operations, the physical quantum computer can be an 
order of magnitude smaller than previously thought. This represents a significant 
step forward both for the practical prospects of quantum computation, and towards 
understanding the fundamental physics underlying questions (1) and (2) enumerated 
above. 



1 Universal gates 

The following notation will be adopted. The single-qubit operators X and Z are the
Pauli operators $\sigma_x$, $\sigma_z$, respectively, and Y = XZ. We use H for the single-qubit
Hadamard operation, R = HZ for the rotation through $\pi/2$ about the y axis of the
Bloch sphere, and P for the rotation through $\pi/2$ about the z axis (phase shift of
$|1\rangle$ by $i$). Thus $R^2 = Y$, $P^2 = Z$ and $(HPH)^2 = X$. The general phase shift of $|1\rangle$
by $\exp(i\phi)$ will be written $P(\phi)$, so $P = P(\pi/2)$, $Z = P(\pi)$, etc.
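The identities just stated can be checked numerically. The following is a minimal sketch, not part of the original analysis: it builds the 2x2 matrices directly from the definitions above (taking Y = XZ literally, so that Y is real), and the helper names `matmul`, `close` and `phase_gate` are introduced here purely for illustration.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))


def phase_gate(phi):
    """P(phi): phase shift of |1> by exp(i*phi)."""
    return [[1, 0], [0, cmath.exp(1j * phi)]]


s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]
Y = matmul(X, Z)              # convention of the text: Y = XZ
R = matmul(H, Z)              # R = HZ
P = phase_gate(math.pi / 2)   # P = P(pi/2)
HPH = matmul(matmul(H, P), H)

checks = {
    "R^2 = Y": close(matmul(R, R), Y),
    "P^2 = Z": close(matmul(P, P), Z),
    "(HPH)^2 = X": close(matmul(HPH, HPH), X),
    "Z = P(pi)": close(phase_gate(math.pi), Z),
    "X = HZH": close(matmul(matmul(H, Z), H), X),
}
```

Each entry of `checks` should evaluate to `True` if the conventions have been transcribed correctly.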



A controlled U operation is written $^{C}U$, so for example $^{C}X$ is controlled-not, and
$T = {}^{CC}X$ is the Toffoli gate.

For operations on bare qubits, the most commonly considered universal set of
quantum gates is $\{U(\theta, \phi), {}^{C}X\}$ where $U(\theta, \phi)$ is a general rotation of a single qubit.
However, this is not a useful set to consider for the purpose of finding fault-tolerant
gates on encoded qubits, because $U(\theta, \phi)$ is not readily amenable to fault-tolerant
methods.



Five slightly different proposals for fault-tolerant universal sets have been put
forward. All involve the normalizer group, generated by $\{X, Z, H, P, {}^{C}X\}$ [18], [21].
Since $Z = P^2$ and $X = HZH$ it is sufficient to have $\{H, P, {}^{C}X\}$ to cover this group.
The normalizer group is not sufficient for universal quantum computation, however, 
nor even for useful quantum computation, since it can be shown that a quantum 
computer using only operations from this group can be efficiently simulated on a 
classical computer [23]. To complete the set a further operation must be added.



1. Shor [12] proposed adding the Toffoli gate, making the universal set $\{H, P, {}^{C}X, T\}$
(or $\{R, P, {}^{C}X, T\}$ which is equivalent since $R = HP^2$). Obviously, $^{C}X$ can be
obtained from T, but this does not reduce the set since Shor's method to
obtain T assumes that $^{C}X$ is already available.



2. Knill, Laflamme and Zurek [16] proposed $\{P, {}^{C}P, {}^{C}X\}$ together with the ability
to prepare the encoded (or 'logical') states $|+\rangle_L = (|0\rangle_L + |1\rangle_L)/\sqrt{2}$,
$|-\rangle_L = (|0\rangle_L - |1\rangle_L)/\sqrt{2}$. This can be shown to be sufficient since preparation of $|\pm\rangle_L$
together with P and X can produce H, and $^{C}P$ and $^{C}X$ suffice to produce$^{1}$
$^{CC}Z$, which with H makes $^{CC}X = T$.

3. The same paper also considers $\{H, P, {}^{C}X, {}^{C}P\}$.

4. The same authors subsequently [0 proposed $\{H, P, {}^{C}X\}$ combined with preparation
of $|\pi/8\rangle_L = \cos(\pi/8)|0\rangle_L + \sin(\pi/8)|1\rangle_L$. The latter is prepared by
making use of the fact that it is an eigenstate of H, and once prepared is used
to obtain a $^{C}H$ operation, from which the Toffoli gate can be obtained.

5. Gottesman [21] showed that $^{C}X$, combined with the ability to measure X, Y
and Z, is sufficient to produce any operation in the normalizer group. The
universal set is completed by T, following Shor.



1 Note the rule of thumb that a controlled rotation through $\theta$ about some axis can be obtained
by combining $^{C}X$ and single-bit rotations through $\theta/2$; a controlled-controlled rotation through $\theta$
can be obtained by combining $^{C}X$ and controlled $\theta/2$ rotations uM.






Of the above methods, (1) is a useful starting point and will be used extensively 
in what follows, but we will need to generalize it to [[n, k, d]] codes storing more than 
one qubit per block. (2) will be considered also, though the codes for which it works 
turn out to be non-optimal. (3) will not be adopted because the implementation of 
P involves repeated recoveries against X errors before a single recovery against Z 
errors is possible. This means that Z errors accumulate for a long time before they 
can be corrected, and the resulting error tolerance is low. (4) will not be adopted 
because it is slow, requiring 12 preparations of $|\pi/8\rangle_L$ for every Toffoli gate, and the
preparation is itself non-trivial. (5) is important because measurement of X, Y and
Z can be performed fault-tolerantly for any stabilizer code, not just [[n, 1, d]] codes.
Gottesman also proposed the use of measurements and whole-block operations to 
swap logical qubits between and within blocks. Thus the Gottesman methods rely 
heavily on measurement, which might be thought to be disadvantageous. How- 
ever, all the methods (1) to (5) involve measurement and/or state preparation to 
implement the Toffoli gate T. Since any useful quantum computation must make 
significant use of T (otherwise it could be efficiently simulated classically), methods 
(1), (2) and (5) are all roughly equivalent in this regard. For example, the speed 
of Shor's algorithm to factorize integers is limited by the Toffoli gates required to 
evaluate modular exponentials |29 , |i~9 . 



So far we have some universal sets of gates, but we lack a construction to show 
how to achieve the particular operations we might need in a given quantum algo- 
rithm. Typically quantum algorithms are built up from the normalizer group and 
the Toffoli gate, combined with rotations of single qubits. Preskill [19] provides a
construction using two Toffoli gates, measurements and a P gate to obtain $P(\phi)$
where $\cos\phi = 3/5$. By repeated use of this and the $\pi/2$ rotations it is easy to
build any other rotation to the requisite precision. Note that this trick generalizes
as follows: if the P gate is replaced by $P(\alpha)$, then the overall result is $P(\phi)$ where
$\cos\phi = (6 + 10\cos\alpha)/(10 + 6\cos\alpha)$.
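As a quick sanity check on the generalized formula, the sketch below evaluates the output phase angle for a given input angle and confirms that an input of pi/2 (the ordinary P gate) reproduces the value 3/5 quoted for the original construction. The function name `generalized_phase` is introduced here for illustration only, and the code evaluates the stated formula rather than simulating the two-Toffoli network itself.

```python
import math


def generalized_phase(alpha):
    """Angle phi with cos(phi) = (6 + 10 cos(alpha)) / (10 + 6 cos(alpha)).

    The denominator is at least 4, so the ratio is always well defined.
    """
    c = (6 + 10 * math.cos(alpha)) / (10 + 6 * math.cos(alpha))
    c = max(-1.0, min(1.0, c))   # guard against rounding just outside [-1, 1]
    return math.acos(c)


phi_from_P = generalized_phase(math.pi / 2)   # P = P(pi/2) as the input gate
cos_phi_from_P = math.cos(phi_from_P)         # should be 3/5 up to rounding
```

The limiting cases behave as one would expect from the formula: an input angle of 0 gives an output angle of 0, and an input angle of pi gives an output angle of pi.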



2 Fault-tolerant operations for CSS codes 



In the list described in the previous section, (1) to (4) gave fault-tolerant operations
for certain [[n, 1, d]] Calderbank Shor Steane (CSS) quantum codes ||, ||, || 0j;
this section will discuss the generalization to [[n, k, d]] codes. (5) gave fault-tolerant
operations for any stabilizer code [25], [26], [27], [28]; this section will give details on the
application to CSS codes, and section 3 will introduce further refinements.



The CSS quantum codes are those whose stabilizer generators separate into X
and Z parts [BR EHf . We restrict attention to these codes, rather than any stabilizer
code, because they permit a larger set of easy-to-implement fault tolerant operations,
and their coding rate k/n can be close to that of the best stabilizer codes. The CSS
codes have the property that the zeroth quantum codeword can be written as an
equal superposition of the words of a linear classical code $C_0$,

$$|0\rangle_L = \sum_{x \in C_0} |x\rangle, \qquad (1)$$

where $|x\rangle$ is a product state, x is a binary word, and the other codewords are
formed from cosets of $C_0$. Let $\tilde{D}$ be the $k \times n$ binary matrix of coset leaders, then
the complete set of encoded basis states is given by

$$|u\rangle_L = \sum_{x \in C_0} |x + u \cdot \tilde{D}\rangle, \qquad (2)$$

where u is a k-bit binary word. We will adopt the convention throughout that
symbols with a tilde, such as $\tilde{D}$, refer to binary matrices. This will avoid confusion
between the Hadamard operator H and a parity check matrix $\tilde{H}$.
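To make eqs. (1) and (2) concrete, the sketch below enumerates the product states appearing in the two encoded basis states of a 7-bit CSS code built from the [7, 4, 3] Hamming code. The choice of this example code, the particular parity check matrix, the all-ones coset leader, and every function name are assumptions made for illustration; words are stored as integer bit masks and the (equal) amplitudes are omitted.

```python
from itertools import product

# Rows of a parity check matrix of the [7,4,3] Hamming code (assumed example).
# They generate C_0, the dual of the Hamming code.
H_ROWS = [0b1010101, 0b0110011, 0b0001111]

# Coset leader matrix D: a single row (k = 1), here the all-ones word.
D_ROWS = [0b1111111]


def span(rows):
    """All binary linear combinations of the given rows."""
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        w = 0
        for c, r in zip(coeffs, rows):
            if c:
                w ^= r          # addition of binary words is bitwise XOR
        words.add(w)
    return words


C0 = span(H_ROWS)


def codeword_support(u):
    """Set of words x + u.D for x in C_0, i.e. the terms of |u>_L in eq. (2)."""
    shift = 0
    for bit, row in zip(u, D_ROWS):
        if bit:
            shift ^= row
    return {x ^ shift for x in C0}


zero_L = codeword_support((0,))   # terms of |0>_L, eq. (1)
one_L = codeword_support((1,))    # terms of |1>_L
```

Because the two cosets are disjoint, the encoded basis states are orthogonal, as they must be.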

We define an operation to be 'legitimate' if it maps the encoded Hilbert space 
onto itself. We define an operation to be 'fault tolerant' if it does not cause errors 
in one physical qubit to propagate to two or more qubits in any one block. Bitwise 
application of a two-bit operator is defined to mean the operator is applied once 
to each pair of corresponding bits in two blocks, and similarly for bitwise three-bit 
operations across three blocks. Legitimate bitwise operations are fault tolerant. 

The following notation will be useful. The bar as in $\bar{U}$ is used to denote the
operation U occurring in the encoded, i.e. logical, Hilbert space, thus
${}_L\langle u|\bar{U}|v\rangle_L = \langle u|U|v\rangle$. A block of n physical qubits stores k logical qubits. The notation $M_x$, where
x is an n-bit binary word, means a tensor product of single-qubit M operators acting
on those physical qubits identified by the 1s in x (for example $X_{101} = X \otimes I \otimes X$).
The notation $\bar{M}_u$, where u is a k-bit binary word, means a tensor product of $\bar{M}$
operators acting on the logical qubits identified by the 1s in u.

Consider a CSS code as defined in eq. (2). Then the encoded X and Z operators
are given by

$$\bar{X}_u = X_{u \cdot \tilde{D}} \qquad (3)$$

$$\bar{Z}_u = Z_{u \cdot (\tilde{D}\tilde{D}^T)^{-1}\tilde{D}} \qquad (4)$$

Equation (3) follows immediately from the code construction, eq. (2), and (4) follows
from $Z_i = \bar{Z}_{\tilde{D} i^T}$, which can be seen from the observation that $Z_i$ changes the sign
of $|u\rangle_L$ whenever $u \cdot \tilde{D}$ fails the parity check i.
We will now examine operators which cannot be expressed as products of X and Z.

Lemma 1. For [[n, 1, d]] codes where all words in $|0\rangle_L$ have weight $r_0$ mod w,
and all words in $|1\rangle_L$ have weight $r_1$ mod w, bitwise application of the following are
legitimate: $P(2\pi/w)$, $^{C}P(4\pi/w)$, $^{CC}P(8\pi/w)$, and achieve respectively $\bar{P}(2r\pi/w)$,
$^{C}\bar{P}(4r\pi/w)$, $^{CC}\bar{P}(8r\pi/w)$, where $r = r_1 - r_0$.

Proof: for clarity we will take $r_0 = 0$ and $r_1 = r$, the proof is easily extended to
general $r_0$. The argument for $^{C}P(4\pi/w)$ was given in ]TB[, but we shall need it for
$^{CC}P(8\pi/w)$, so we repeat it here. Consider $^{C}P(4\pi/w)$ applied to a tensor product of
two codewords. Let x, y be binary words appearing in the expressions for the two
codewords, and let a be the overlap (number of positions sharing a 1) between x
and y. Let $|x|$ denote the weight of a word x. Then $2a = |x| + |y| - |x + y|$. There
are three cases to consider. First if $x, y \in C_0$ then $|x| = 0$ mod w, $|y| = 0$ mod w
and $|x + y| = 0$ mod w so $2a = 0$ mod w from which $a = 0$ mod w/2. Therefore the
multiplying factor introduced by the bitwise operation is 1. If $x \in C_0$ and $y \in C_1$
then $x + y \in C_1$ so $|x| = 0$ mod w, $|y| = |x + y| = r$ mod w so $2a = 0$ mod w
again. If $x, y \in C_1$ then $x + y \in C_0$ so $a = r$ mod w/2 and the multiplying factor
is $\exp(i r 4\pi/w)$. The resulting operation in the logical Hilbert space is therefore
$^{C}\bar{P}(4r\pi/w)$.

Next consider $^{CC}P(8\pi/w)$ applied to a tensor product of three codewords. Let
x, y, z be words appearing in the three codeword expressions, and a, b, c be the
overlap between x and y, y and z, and z and x, respectively. Let d be the common
overlap of x, y and z, so

$$|x + y + z| = |x| + |y| + |z| - 2a - 2b - 2c + 4d. \qquad (5)$$

There are four cases to consider. If $x, y, z \in C_0$ then $d = 0$ mod w/4. If $x, y \in C_0$,
$z \in C_1$ then $|x + y + z| = |z|$, $2a = 2b = 2c = 0$ mod w from the argument
just given, therefore $d = 0$ mod w/4. If $x \in C_0$, $y, z \in C_1$ then $x + y + z \in C_0$,
$2a = 2c = 0$ mod w while $2b = 2r$ mod $w = |y| + |z|$ so again $d = 0$ mod w/4. If
$x, y, z \in C_1$ then $x + y + z \in C_1$, $2a = 2b = 2c = 2r$ mod w, therefore $d = r$ mod w/4.
The overall effect is that of the operation $^{CC}\bar{P}(8r\pi/w)$. □
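The two counting identities on which this proof rests, the pairwise relation for the overlap a and eq. (5) for the triple overlap d, hold for arbitrary binary words and can be confirmed by brute force. The sketch below does this exhaustively for short words; the function names and the choice of 4-bit words are illustrative assumptions, and the check says nothing about any particular code.

```python
from itertools import product


def weight(x):
    """Hamming weight of a word stored as an integer bit mask."""
    return bin(x).count("1")


def identities_hold(n_bits=4):
    """Check 2a = |x|+|y|-|x+y| and eq. (5) for all triples of n_bits-bit words."""
    for x, y, z in product(range(2 ** n_bits), repeat=3):
        a = weight(x & y)        # overlap of x and y
        b = weight(y & z)        # overlap of y and z
        c = weight(z & x)        # overlap of z and x
        d = weight(x & y & z)    # common overlap of all three
        if 2 * a != weight(x) + weight(y) - weight(x ^ y):
            return False
        lhs = weight(x ^ y ^ z)
        rhs = weight(x) + weight(y) + weight(z) - 2 * a - 2 * b - 2 * c + 4 * d
        if lhs != rhs:
            return False
    return True
```

Calling `identities_hold()` loops over all 4096 triples of 4-bit words; the case analysis in the proof then only has to track these quantities modulo w.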

Lemma 1 applied to codes with w = 8 or more provides a quicker way to gener- 
ate the Toffoli gate than previously noted, and also provides an extra single-bit gate 
P(7tt/8). The latter can be used to generate further rotations using the generalized 
two-Toffoli method referred to at the end of section 1. The Lemma 1 concept generalizes
to $^{CCC}P(16\pi/w)$ and so on, but the codes for which this is useful (i.e. having
$w \geq 16$) are either inefficient or too unwieldy to produce good error thresholds.
Gottesman [21] provides an elegant way to find some fault-tolerant operations.
Let $\mathcal{G}$ be the group generated by n-bit products of I, X, Y, Z where I is the identity,
let M be a member of $\mathcal{G}$, and let S be the stabilizer. Then for operations U satisfying
$UMU^\dagger \in \mathcal{G}\ \forall M \in S$, U is legitimate as long as $UMU^\dagger \in S$. This formalism permits
one to prove the following straightforwardly:

Lemma 2. Bitwise $^{C}X$ is legitimate for all CSS codes.

Lemma 3. Bitwise H and $^{C}Z$ are legitimate for any $[[n, 2k_c - n, d]]$ CSS code
obtained from a $[n, k_c, d]$ classical code which contains its dual.

Lemma 4. Let C be a $[n, k_c, d]$ classical code which contains its dual, and for
which the weights of the rows of the parity check matrix are all integer multiples of
4. Then bitwise P is legitimate for the $[[n, 2k_c - n, d]]$ CSS code obtained from C.

An alternative proof of these lemmas will emerge as we examine the effect of the 
relevant operations. 

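Bitwise $^{C}X$ acts as follows:

$$^{C}X_{\rm bitwise}\, |u\rangle_L |v\rangle_L = \sum_{x \in C_0} \sum_{y \in C_0} |x + u \cdot \tilde{D}\rangle\, |y + v \cdot \tilde{D} + x + u \cdot \tilde{D}\rangle \qquad (6)$$

$$= \sum_{x \in C_0} \sum_{y \in C_0} |x + u \cdot \tilde{D}\rangle\, |y + (u + v) \cdot \tilde{D}\rangle \qquad (7)$$

$$= |u\rangle_L |u + v\rangle_L. \qquad (8)$$

Step (7) relabels y + x as y, which is permitted because $C_0$ is linear. This is
$^{C}X$ from each logical qubit in the first block to the corresponding one in the
second.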
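The action of the bitwise controlled-not on two encoded blocks can be checked on a small example. The sketch below uses the 7-bit CSS code built from the [7, 4, 3] Hamming code (an example chosen here, not taken from the text) and tracks only the set of product states in each superposition, which suffices because all amplitudes are equal and the gate permutes product states. All names are hypothetical.

```python
from itertools import product

# Assumed example: C_0 is generated by a parity check matrix of the
# [7,4,3] Hamming code; the single coset leader is the all-ones word.
H_ROWS = [0b1010101, 0b0110011, 0b0001111]
D_WORD = 0b1111111


def span(rows):
    """All binary linear combinations of the given rows."""
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        w = 0
        for c, r in zip(coeffs, rows):
            if c:
                w ^= r
        words.add(w)
    return words


C0 = span(H_ROWS)


def support(u):
    """Product states appearing in |u>_L for the one logical qubit u."""
    shift = D_WORD if u else 0
    return {x ^ shift for x in C0}


def two_block_support(u, v):
    """Product states |x>|y> appearing in |u>_L |v>_L."""
    return {(x, y) for x in support(u) for y in support(v)}


def bitwise_cx(pairs):
    """Controlled-not from each bit of the first block to its partner bit."""
    return {(x, x ^ y) for (x, y) in pairs}


# Compare with the right-hand side |u>_L |u+v>_L for all logical inputs.
all_ok = all(
    bitwise_cx(two_block_support(u, v)) == two_block_support(u, u ^ v)
    for u, v in product((0, 1), repeat=2)
)
```

With k = 1 the addition u + v is a single XOR; for k > 1 the same check would run over k-bit words u and v.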

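Bitwise H acts as follows on $|u\rangle_L$:

$$H_{\rm bitwise} \sum_{x \in C_0} |x + u \cdot \tilde{D}\rangle = \sum_{y \in C_0^\perp} (-1)^{u \tilde{D} y^T} |y\rangle. \qquad (9)$$

If $C_0^\perp$ contains its dual $C_0$, as required for lemma 3, then $\tilde{D}$ and $C_0$ together generate
$C_0^\perp$, so this can be written

$$H_{\rm bitwise} |u\rangle_L = \sum_{v=0}^{2^k-1} \sum_{x \in C_0} (-1)^{u \tilde{D}\tilde{D}^T v^T} |x + v \cdot \tilde{D}\rangle = \sum_{v=0}^{2^k-1} (-1)^{u \tilde{D}\tilde{D}^T v^T} |v\rangle_L \qquad (10)$$

where to simplify the power of (-1) we used the fact that $C_0$ is generated by the
parity check matrix of $C_0^\perp$, so $u \tilde{D}$ satisfies the parity check $x \in C_0$. Equation (10) is
a Hadamard transform acting in the logical Hilbert space when $\tilde{D}\tilde{D}^T = I$, and is a
closely related transformation when $\tilde{D}\tilde{D}^T \neq I$.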
Using the above ideas it is easy to show that bitwise $^{C}Z$ produces, for codes
satisfying lemma 3,

$$^{C}Z_{\rm bitwise}\, |u\rangle_L |v\rangle_L = (-1)^{u \tilde{D}\tilde{D}^T v^T} |u\rangle_L |v\rangle_L. \qquad (11)$$

We will prove lemma 4 by showing that all the quantum codewords have
$|x + u \cdot \tilde{D}| = |u \cdot \tilde{D}|$ mod 4, so the weights modulo 4 of the components in (2) depend on u
but not on x. The effect of bitwise P will therefore be to multiply $|u\rangle_L$ by the phase
factor $i^{|u \cdot \tilde{D}|}$.

The zeroth codeword is composed from the code $C_0 = C^\perp$ generated by $\tilde{H}$, the
parity check matrix of C. Let y and z be two rows of $\tilde{H}$, then the conditions of the
lemma guarantee $|y| = 0$ mod 4 and $|z| = 0$ mod 4. Furthermore, since C contains
its dual, each row of $\tilde{H}$ satisfies all the checks in $\tilde{H}$, so y and z have even overlap
2m. Therefore $|y + z| = 4m$ mod 4 $= 0$ mod 4, therefore $|x| = 0$ mod 4 for all words
in $|0\rangle_L$. Next consider a coset, formed by displacing $C_0$ by the vector $w = u \cdot \tilde{D}$.
Since this coset is in C it also satisfies all the checks in $\tilde{H}$, therefore its members
have even overlap with any $x \in C_0$. Hence if $|w| = r$ mod 4 then $|x + w| = r$ mod 4
for all the terms in the coset, which proves the lemma.
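The weight argument can be illustrated on the smallest code meeting the conditions of lemma 4. The sketch below assumes the [7, 4, 3] Hamming code as C (its parity check rows all have weight 4, and it contains its dual) with the all-ones word as coset leader; these choices and all names are ours. It collects the weights modulo 4 that occur in each encoded basis state.

```python
from itertools import product

# Assumed example: rows of a parity check matrix of the [7,4,3] Hamming code.
H_ROWS = [0b1010101, 0b0110011, 0b0001111]
D_WORD = 0b1111111            # coset leader w = u.D for u = 1


def weight(x):
    """Hamming weight of a word stored as an integer bit mask."""
    return bin(x).count("1")


def span(rows):
    """All binary linear combinations of the given rows."""
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        w = 0
        for c, r in zip(coeffs, rows):
            if c:
                w ^= r
        words.add(w)
    return words


C0 = span(H_ROWS)

# Conditions of lemma 4: row weights are multiples of 4, overlaps are even.
rows_doubly_even = all(weight(r) % 4 == 0 for r in H_ROWS)
overlaps_even = all(weight(r & s) % 2 == 0 for r in H_ROWS for s in H_ROWS)

# Weights modulo 4 occurring in |0>_L and |1>_L.
residues = {
    0: {weight(x) % 4 for x in C0},
    1: {weight(x ^ D_WORD) % 4 for x in C0},
}
```

For this example each encoded state has a single residue, so bitwise P multiplies $|0\rangle_L$ by 1 and $|1\rangle_L$ by $i^3$, in line with the phase factor $i^{|u \cdot \tilde{D}|}$ derived above.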

The case $\tilde{D}\tilde{D}^T = I$, which leads to a simple effect for bitwise H, also simplifies
bitwise P. If $\tilde{D}\tilde{D}^T = I$ then every row of $\tilde{D}$ has odd overlap with itself (i.e. odd
weight) and even overlap with all the other rows. Using an argument along similar
lines to the one just given, we deduce that the effect is the $\bar{P}^r$ operator applied to
every logical qubit in the block, where r is the weight of the relevant row of $\tilde{D}$.



3 Measurements and the Toffoli gate 

Our set of fault tolerant operations now suffices to generate the group
$\mathcal{G}$ of encoded I, X, Y, Z operations on individual logical qubits, and the normalizer
group on whole blocks (k logical qubits) at a time, for the lemma 4 codes. It remains
to extend the normalizer group to individual encoded qubits, and to find a fault tolerant
Toffoli gate. For the former, we adopt Gottesman's [21] proposal of switching
logical qubits into otherwise empty blocks, applying whole-block operations, then
switching back. For the latter, we use inter-block switching together with Shor's
[12] implementation of the Toffoli gate, as simplified by Preskill [19]. That the Shor
technique works for lemma 4 (and lemma 3) codes follows from the following:

Lemma 5. For CSS codes in which bitwise $^{C}Z$ is legitimate, bitwise $^{CC}Z$ is
legitimate when operating on two control blocks in the logical Hilbert space, and
a target block in the space spanned by n-bit 'cat' states $|000\cdots0\rangle \pm |111\cdots1\rangle$. If
bitwise $^{C}Z$ has the effect $|u\rangle_L |v\rangle_L \to (-1)^{u v^T} |u\rangle_L |v\rangle_L$, then bitwise $^{CC}Z$ has the
effect $|u\rangle_L |v\rangle_L |a\rangle \to (-1)^{a(u v^T)} |u\rangle_L |v\rangle_L |a\rangle$, where a = 0 or 1 and $|a\rangle$ means the
n-bit state $|000\cdots0\rangle$ or $|111\cdots1\rangle$ accordingly.



Proof: Consider eq. (11) and expand $|u\rangle_L |v\rangle_L$ into a sum of 2n-bit product
states $|x\rangle |y\rangle$. The bitwise $^{C}Z$ operator can only have the effect (11) if the overlap
of x and y is the same, modulo 2, for every term in the sum. Therefore the bitwise
$^{CC}Z$ operator as described in lemma 5 produces the same number of Z operations
on the cat state, modulo 2, for every term in the corresponding expansion, and the
effect is as described. □

Gottesman's switching and swapping techniques make much use of the ability to 
measure X or Z operators fault tolerantly. A method to perform such measurements 
was deduced by DiVincenzo and Shor [14], based on preparation of verified 'cat'
states $|000\cdots0\rangle + |111\cdots1\rangle$. However, the preparation and verification of these
states involves many elementary gates, and the measurement must be repeated to 
ensure reliability. These operations take a considerable number of elementary gates 
and time steps, during which errors accumulate. This significantly reduces the 
tolerance on error rates in the computer. Our next ingredient is an important trick 
to circumvent this problem: 

Lemma 6. For any stabilizer code, measurement of any operator $\bar{X}_u$ or $\bar{Y}_u$ or
$\bar{Z}_u$ can be performed at no cost by merging it with the recovery operation.



We will implement fault tolerant recovery using the method of Steane [15], [20],
which is based on preparing a 2n-bit ancilla in a superposition of $2^{n+k}$ product states
which satisfy the parity checks in the stabilizer. The measurement technique which
underlies lemma 6 is illustrated for a CSS code in figure 1. In order to measure
$\bar{X}_{010}$ in this example we prepare an ancilla in $|000\rangle_L + |010\rangle_L$ and operate bitwise
$^{C}X$, then Hadamard transform the ancilla and measure it. This permits us to learn
simultaneously the result of measuring $\bar{X}_{010}$ on the logical qubits, and the syndrome
for Z errors, which can then be corrected (the whole network is repeated as necessary,
see section 5). Replacing $^{C}X$ by $^{C}Z$, a measurement of $\bar{Z}_u$ can be accomplished
while learning simultaneously the syndrome for X errors. The structure of CSS
codes permits the 2n-bit ancilla to consist of two separate blocks of n bits, which is
why fig. 1 only shows an n-bit ancilla. For general stabilizer codes the method is
essentially the same but does not have such an elegant expression in terms of logical
qubit states.

A standard recovery, without measurement of any observable other than the
syndrome, involves the preparation of $|0\rangle_L$. To prepare $|0\rangle_L + |u\rangle_L$ from $|0\rangle_L$ we
simply add a one-bit Hadamard transform and $^{C}X$s to target bits at the non-zero
coordinates in $u \cdot \tilde{D}$. This is only slightly more complicated than preparation of
$|0\rangle_L$ because the number of rows in $\tilde{H}$, which gives the network to construct $|0\rangle_L$, is
much larger than 1 for powerful error correcting codes. Furthermore, since $|0\rangle_L + |u\rangle_L$
satisfies fewer parity checks than $|0\rangle_L$, the verification of the prepared state is
quicker. Hence the claim "at no cost" in lemma 6 is justified.

The final ingredient, before we can calculate error tolerance levels for these meth- 
ods, is to examine exactly how the switching/swapping and Toffoli gates work, in 
order to see how frequently recovery must be performed. Figures 2 to 5 show example
quantum networks. The examples all show a case in which three logical qubits
are stored in each block, and horizontal lines indicate logical rather than physical 
qubits. The zigzags on some of the qubits are a visual aid to keep track of the 
quantum information, which propagates between blocks when operations such as 
quantum teleportation take place. The figures show measurements of X or Z taking 
place by means of cat states. The networks are drawn in this form to make it clearer 
how they operate, but it is understood that at this point in the actual implementation
the better method of figure 1 and lemma 6 would be used, so a recovery takes
place. The exception is the $^{CC}Z$ gate in the network for the Toffoli gate, figure 4,
which must have a cat state as target. This will be discussed shortly. 

Figure 2 shows a $^{C}X$ from the 2nd to the 3rd logical qubit in a data block, using
two ancilla blocks prepared in $|000\rangle_L$. The first part of the network is a quantum
teleportation from the 2nd bit of the data block to the 3rd bit of the 2nd ancilla.
Then a block $^{C}X$ takes place from this ancilla onto the data block. Finally another
teleportation puts the bit back into the data block. The whole operation uses
four recoveries.

Figure 3 introduces a shorthand symbol for quantum teleportation, and gives
two example implementations, depending on which ancilla states ($|0\rangle_L$ or $|+\rangle_L$, etc.)
one happens to have available. This is to make the point that teleportation can be
carried out via any state in the Bell basis, and so the ancillas can be in one of many
different initial states. This reduces the amount of ancilla preparation needed for
networks such as figures 2 and 4.



Figure 4 gives the implementation of the Toffoli gate described by Preskill [19]
based on Shor's ideas. The figure does not show the complete network. The operations
in the dashed box are only carried out if the measurement indicated gives a 1.
If one or other of the two other measurements on the data input block give a one, the
network in the dashed box changes, but it still involves simple whole-block operations
plus two teleportations. The other feature not shown on fig. 4 is the repetition
of the measurement (via cat) used to prepare the ancillas. During its preparation,
the cat is verified against X errors in order that the $^{CC}Z$ does not propagate uncorrectable
errors into the 3rd ancilla. However, the cat has a chance of acquiring a Z
error, which makes the measurement fail with probability linear in the error rates.
We therefore repeat this part of the network, using further n-bit cats prepared in
parallel. By choosing a total of d repetitions, where d is the minimum distance of
the error-correcting code being used, we ensure that by taking the majority vote,
the probability of failure is lower than that of accumulating too many errors during
recovery. Overall, the Toffoli network requires about 8 recoveries, allowing for two
recoveries during the ancilla preparation part of the network, one when the data
block is measured, two each for the teleportations, and a further one for the final
switching operation (see fig. 5).
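The benefit of the d-fold repetition with majority voting can be illustrated with a simple binomial estimate. The sketch below assumes, as a simplification that the text does not state, that each repetition fails independently with the same probability p; the function name `majority_failure` and the sample value of p are hypothetical. A tie (possible only for even d) is counted as a success.

```python
from math import comb


def majority_failure(d, p):
    """Probability that more than half of d independent repetitions fail,
    each failing with probability p (independence is an assumption here)."""
    need = d // 2 + 1
    return sum(comb(d, j) * p ** j * (1 - p) ** (d - j)
               for j in range(need, d + 1))


# Illustrative numbers only: one measurement failing with probability 1e-3.
p_single = 1e-3
by_distance = {d: majority_failure(d, p_single) for d in (1, 3, 7, 15)}
```

Under this assumption the leading term for odd d scales as $p^{(d+1)/2}$, which is the same power of the error rate as the appearance of $(d+1)/2$ errors in a block, one more than a distance-d code can correct.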

Figure 5 shows how to switch the ith logical bit between a data block and an
ancillary block, using a single recovery.



4 Candidate quantum codes 



In this section we will find CSS codes which meet the requirements of the fault
tolerant methods considered in sections 2 and 3.

The general idea is that we would like $C_0$ and its cosets given by $\tilde{D}$ to have
weight distributions which permit methods such as lemma 1, while also forming the
code $C_0^\perp$, in order to satisfy lemma 3. However, the possibilities are restricted by the
fact that a self-dual classical code over GF(2) with weights all a multiple of w > 1
can only have w = 2 or w = 4 [30], [31].



CSS codes $[[n = 2^m - 1, 1, d]]$ satisfying lemma 1 with w = 0 mod 8 can be constructed
from punctured binary Reed-Muller codes, the simplest example is [[15, 1, 3]]
given in [16]. A related possibility is $[[n = 2^m - 1, k, d = 3]]$, $m \geq 7$, in which $C_0$
is a punctured 1st order Reed-Muller code (whose dual has minimum distance 3)
and $C_0$ together with its cosets make a punctured 2nd order Reed-Muller code. The
simplest example is [[127, 29, 3]]. The properties of these codes are far from optimal,
so we will not pursue them further.

Let us now concentrate on classical codes suitable for lemma 4 (and therefore for
all of lemmas 2-6). Let $C_0 = C^\perp$, then by the proof of the lemma, all the weights
of $C_0$ are multiples of 4. Such codes $C_0$ are called "doubly even", or "type II" or
sometimes merely "even." Doubly even codes are always contained in their duals
since the rows of the generator must all have even overlap with themselves and each
other. We will refer to the CSS codes having doubly even $C_0$ as "lemma 4" codes.

Note that once we have a [[n, k, d]] lemma 4 code, a $[[n - 1, k + 1, d' \geq d - 1]]$
lemma 4 code can be obtained by deleting a row from the generator of $C_0$ (which is
the check matrix of C) H].

Any extended quadratic residue classical code of length n = 8m is doubly even
PH. This yields a set of good lemma 4 codes beginning [[24, 0, 8]], [[48, 0, 12]],
[[80, 0, 16]], [[104, 0, 20]], ...; in this list only the code of the smallest n for given d is
mentioned. No better codes exist for d < 20, and none better is known to exist for
d = 20 [32]. From these we obtain lemma 4 codes such as [[47, 1, 11]], [[79, 1, 15]],
[[99, 5, 15]] which are considered in section 5. In section 5 (eq. ([0|)) we will need to
know enough about the weight distribution of $C_0$ to estimate the average weight of a
row of the generator matrix of $C_0$. This average weight is equal to the average weight
of the generator of the self-dual code we started from, which is well approximated
by its minimum distance in this case (i.e. respectively 8, 12, 16, 20 for the codes in
the first list in this paragraph).



Bose Chaudhuri Hocquenghem (BCH) classical codes [33], [34], ^TJ yield a good
set of CSS quantum codes. The condition for a BCH code to contain its dual
was discussed in BR ESI. It can be shown that the dual of double- and triple-error
correcting BCH codes of length $2^m - 1$ is doubly even when m > 3 [31], and I
conjecture that the dual of a BCH code of length $2^m - 1$ is doubly even whenever the
code contains its dual. I have checked that this conjecture is satisfied for n ≤ 127 by
examining the parity check matrices. Hence we have a large class of lemma 4 codes,
containing for example [[31,11,5]], [[31,1,7]], [[63,39,5]], [[63,27,7]], [[127,85,7]],
[[127,43,13]], [[127,29,15]], and [[255,143,15]] by conjecture. Of these, the codes
[[127,29,15]] and [[127,43,13]] yield the best results in section 5. For length $2^m - 1$
BCH codes, the weights of the rows of the parity check matrix are $2^{m-1}$. This value
is required in section 5 (equation (15)).
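The quantum parameters listed above can be cross-checked against the classical ones. A CSS code built from a classical [n, k_c, d] code that contains its dual encodes 2k_c - n qubits. In the sketch below the classical BCH parameters are taken from standard tables and are an assumption of this example; only [127, 78, 15] is quoted in the text itself. The helper names are hypothetical.

```python
# (n, k_classical, d) of primitive binary BCH codes, from standard tables
# (assumed here; the text only names the [127, 78, 15] code explicitly).
BCH_CODES = [
    (31, 21, 5), (31, 16, 7), (63, 51, 5), (63, 45, 7),
    (127, 106, 7), (127, 85, 13), (127, 78, 15), (255, 199, 15),
]

def css_parameters(n, k_classical, d):
    """[[n, k, d]] of the CSS code from a dual-containing [n, k_classical, d] code."""
    return (n, 2 * k_classical - n, d)

def check_row_weight(n):
    """Row weight 2^(m-1) of the parity check matrix for length n = 2^m - 1."""
    m = (n + 1).bit_length() - 1
    return 2 ** (m - 1)

for code in BCH_CODES:
    print(code, "->", css_parameters(*code), "row weight", check_row_weight(code[0]))
```

For n = 127 this gives a row weight of 64, which is the value of w entering the later estimates for the [[127,29,15]] and [[127,43,13]] codes.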



5 Error tolerance and overheads 



We now wish to estimate the amount of noise which can be tolerated by a quantum
computer using the methods discussed. The estimate is made through an analysis
based on that in [20], but with some new features.



The quantum computer will operate as follows. A computation involving K
logical qubits will be carried out using K/k blocks to store the quantum information,
plus a further 3 blocks which act as an "accumulator". Each accumulator block
uses the same error correcting code as the rest of the computer, but only stores one
logical qubit at a time. Quantum gates on the logical information in the computer
are carried out via the accumulator using the methods illustrated in figs 2 to ^[. A
larger accumulator of ~K/4k blocks could be used without greatly changing the
overall results.

In order to extract syndromes, we prepare, in parallel, 4 ancillary blocks for every
data block or accumulator block in the computer. A single complete syndrome is
extracted for the 3 + K/k blocks, using 2 of the prepared ancillas for each block.
Whenever this first syndrome indicates no errors, we accept it even though it has
a small chance of being the wrong syndrome [30]. Those blocks are left alone; any
errors they contain will with high probability be corrected by the next recovery.
There remain a number of blocks whose first syndrome was non-zero. Each of these
blocks will be corrected, but only after the syndrome has been extracted a further
O(d) times, and the best estimate syndrome (e.g. by majority vote) is used to
correct the block. The ancillas required for these further extractions have already
been prepared: they are the remaining 2(3 + K/k) ancillas which were not used for
the first syndrome.
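As a minimal sketch of the "best estimate syndrome" step, the function below picks the syndrome seen most often among the repeated extractions. This whole-syndrome vote is only one possible reading of "majority vote"; the function name and the example data are hypothetical.

```python
from collections import Counter

def best_estimate_syndrome(extractions):
    """Return the syndrome observed most often among repeated extractions.

    Ties are resolved in favour of the syndrome that was seen first.
    """
    counts = Counter(tuple(s) for s in extractions)
    syndrome, _ = counts.most_common(1)[0]
    return syndrome

# Five extractions of a 4-bit syndrome; the third one was corrupted by noise.
repeats = [
    (1, 0, 1, 0),
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (1, 0, 1, 0),
    (1, 0, 1, 0),
]
print(best_estimate_syndrome(repeats))  # (1, 0, 1, 0)
```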

Each ancilla uses n qubits to store the prepared state, plus one used for verification,
therefore the total number of physical qubits in the computer is (5n + 4)(3 + K/k).
The ratio

$$ S = \frac{5n + 4}{k} \tag{12} $$

is the scale-up in computer size necessary to allow fault tolerance by this method.
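The scale-up ratio is easy to evaluate for the codes named in the text. The sketch below does so, and also counts physical qubits for a given number K of logical qubits. Rounding K/k up to a whole number of blocks is an assumption of this example, and the function names are hypothetical.

```python
from math import ceil

def scale_up(n, k):
    """Scale-up S = (5n + 4) / k: one data block plus four (n+1)-qubit ancillas."""
    return (5 * n + 4) / k

def physical_qubits(n, k, K):
    """Total qubits for K logical qubits: ceil(K/k) data blocks plus 3 accumulator blocks."""
    blocks = 3 + ceil(K / k)
    return (5 * n + 4) * blocks

for n, k, d in [(127, 29, 15), (127, 43, 13), (47, 1, 11), (79, 1, 15)]:
    print(f"[[{n},{k},{d}]]  S = {scale_up(n, k):.1f}")
```

For the [[127,29,15]] code this gives S of about 22, the figure quoted in the abstract, against 239 and 399 for the two k = 1 codes.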

The method described in the previous paragraphs makes better use of ancillas
than the simplest approach of generating O(d) syndromes for every block. In order
that the prepared ancillas are sufficient in number, we require that the probability
of obtaining a non-zero syndrome is less than O(1/d). We will confirm that this is
the case at the end of the calculation.

We need to estimate the number of elementary gates and time steps required to
prepare an ancillary n-bit block in the state $|0\rangle_L$, and to verify the state so that the
only X errors which remain in it are uncorrelated with each other. Let H be the
generator of $C^\perp$, and let w be the average weight of a row of H. We would like w to
be small so that ancillas can be prepared quickly, but the construction of [[n, k, d]]
CSS codes rules this out for k ≫ 1. Since all the cosets used to build the quantum
codewords (see eq. H) are at distance at least d from each other, $C^\perp$ must consist of
words separated by significantly more than d, so w may be several times larger than
d.
The network to build $|0\rangle_L$ consists of one Hadamard gate and w - 1 controlled-nots
for each row of H. In [20] it was assumed that gates on different blocks could
be performed in parallel, while 2- or more-bit gates within a block could not. Here
we shall allow a further type of parallelism, namely that a multiple controlled-not,
involving one control bit and several target bits in the same block, can be
performed in a single time step. This is physically reasonable since it is possible
in implementations such as the ion trap, in which a communal degree of freedom
is coupled to every qubit in a block, and we can drive transitions in many qubits
simultaneously. The number of time steps required to build $|0\rangle_L$ is therefore equal
to the number of rows in H, which is (n - k)/2.

The most time-consuming part of the ancilla preparation is the verification. This
involves the evaluation of parity checks using controlled-nots from several control
bits to a single target bit, which cannot be done in parallel. We will assume a
thorough verification, evaluating all the parity checks in H and D. The former
confirms that the prepared state is in the encoded Hilbert space, the latter that $|0\rangle_L$
rather than some other encoded state has been prepared. To keep the number of
time steps to a minimum, we arrange for D to contain as few 1s as possible. Since
we have a distance d CSS code, the weight of each row of D is at least d. We will
assume D can be arranged to have a mean weight per row of d + 1. Hence the total
number of controlled-not gates used for verification of one ancilla is w(n - k)/2 + (d + 1)k.
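The counts given in the last two paragraphs can be collected into one small calculation. The sketch below assumes n - k is even and takes w = 64 for the [[127,29,15]] code (the BCH row weight $2^{m-1}$ with m = 7); the function name and dictionary keys are hypothetical.

```python
def ancilla_costs(n, k, d, w):
    """Gate and time-step counts for preparing and verifying one ancilla.

    rows = (n - k)/2 is the number of rows of H. Each row costs one Hadamard
    and w - 1 controlled-nots, done in a single time step. Verification uses
    w controlled-nots per row of H and d + 1 per row of D (k rows).
    """
    rows = (n - k) // 2
    return {
        "prep_time_steps": rows,
        "prep_hadamards": rows,
        "prep_cnots": rows * (w - 1),
        "verify_cnots": w * rows + (d + 1) * k,
    }

print(ancilla_costs(n=127, k=29, d=15, w=64))
```

With these inputs the preparation takes 49 time steps, while verification needs 3600 sequential controlled-nots, which illustrates why verification dominates the preparation time.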

One complete recovery of the computer fails if any block develops more than
t = (d - 1)/2 errors, or any correction is applied on the basis of a wrong syndrome.
It is shown in [20] that the failure probability is dominated by the former when the
syndrome is extracted O(t + 1) or more times. First consider the blocks whose first
extracted syndrome was non-zero. For one of these blocks the probability to develop
more than t errors is [20]



$$ P \simeq 2 \sum_{i=t+1}^{g} \frac{g!}{i!\,(g-i)!} \left( \frac{2}{3}\gamma + \frac{s}{g}\epsilon \right)^{i} \tag{13} $$

where $\gamma$ is the probability of gate failure, $\epsilon$ the probability of memory error per
time step, and g, s are the number of independent opportunities for gate, memory
errors respectively. The errors either occur directly in the block to be corrected, or
they originate in an ancilla and are subsequently propagated into the block. Using
the assumptions made above concerning the weights of H and D, and the degree of
parallelism, an analysis similar to that in [20] yields



$$ g \simeq n(4r + 1) \tag{14} $$

$$ s \simeq n \left( (w + 2)\frac{n-k}{2} + (d + 2)k + n(2 + r/2) \right), \tag{15} $$
where r is the number of repetitions of the syndrome extraction required on average
for confidence that one has the right syndrome. We will take r = t + 1, which is a
safe over-estimate, as discussed in [20].



The only possibility left out of equation (13) is that a zero syndrome is wrongly
obtained for some block during the first round of syndrome extraction (thus the
block remains uncorrected), and then the further errors developed during the next
recovery bring the total up to more than t before correction is applied. Consider the
two successive recoveries of such a block, in which no corrective measure is applied
after the first recovery. There are approximately twice as many error opportunities,
so the probability that t + 1 errors develop can be estimated as $2^{t+1} P$. However, to
wrongly obtain a zero syndrome requires an error in the syndrome which matches
the syndrome, which is highly unlikely. Its probability is small compared to $1/2^{t+1}$,
therefore this failure mechanism is much less likely than the one leading to eq. (13).
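The factor $2^{t+1}$ can be illustrated numerically. The sketch below uses a plain binomial tail as a stand-in for the failure probability (an assumption of this example, not the exact expression of the text) and compares the tail for g and 2g error opportunities. The values of g and p are arbitrary illustrative choices in the regime where gp is small.

```python
from math import comb

def tail(g, p, t, terms=20):
    """Probability of more than t errors among g opportunities of probability p.

    Only the first `terms` terms of the binomial tail are summed, which is
    ample when g * p is much less than 1.
    """
    upper = min(g, t + terms)
    return sum(
        comb(g, i) * p**i * (1 - p) ** (g - i)
        for i in range(t + 1, upper + 1)
    )

g, p, t = 4000, 1e-6, 7           # illustrative values only
ratio = tail(2 * g, p, t) / tail(g, p, t)
print(ratio, 2 ** (t + 1))        # the ratio is close to 2^(t+1) = 256
```

Doubling the number of opportunities multiplies the leading term, which is proportional to $g^{t+1}$, by $2^{t+1}$, which is why the unlikely wrong zero syndrome must be rarer than $1/2^{t+1}$ for this mechanism to be negligible.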

To run a quantum algorithm needing K qubits and Q Toffoli gates, we require

$$ P < \frac{k}{8KQ} \tag{16} $$

since we require 8Q recoveries of each block (see section 3) and there are 3 + K/k ≃
K/k blocks, and all these recoveries must succeed so that the overall success probability
is greater than a half.



Table 1 shows the scale-up and error rates needed to satisfy (13) to (16) for
various quantum error correcting codes, for $KQ = 2.15 \times 10^{12}$. This size of computation
is sufficient to factorize a 130 digit (430 bit) number using Shor's algorithm
[f2g 0. Comparing the [[127,29,15]] and [[127,43,13]] codes with the [[47,1,11]]
and [[79,1,15]] codes we see that the more efficient codes allow one to save about
a factor 10 in the scale-up S, with no change in noise level, or a factor 17 in S if
the memory noise $\epsilon$ is reduced by a factor 3. If we wish to factorize thousand-digit
numbers, then KQ grows by a factor $9^4 = 3^8$. However, P scales as $\gamma^{t+1}$, so (16)
would still be satisfied by the distance 15 codes if $\gamma$ and $\epsilon$ were reduced by a factor
3.
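The arithmetic behind this last remark can be spelled out. The bound on P tightens in proportion to 1/(KQ), while P itself falls as the (t + 1)-th power of the noise. The sketch below compares the two factors for distances 13 and 15; the function name is hypothetical.

```python
def gain_from_noise_reduction(d, factor=3):
    """Factor by which P drops when gamma and epsilon shrink by `factor`,
    using P ~ gamma^(t+1) with t = (d - 1)/2."""
    t = (d - 1) // 2
    return factor ** (t + 1)

growth_in_KQ = 9 ** 4                      # growth quoted for thousand-digit numbers
print(growth_in_KQ)                        # 6561
print(gain_from_noise_reduction(15))       # 6561: exactly compensates
print(gain_from_noise_reduction(13))       # 2187: not enough on its own
```

For d = 15 the gain $3^8$ equals the growth $9^4$, which is why the distance 15 codes are singled out.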



The assumption that eq. (12) allows for sufficiently many ancillas needs to be
verified. The probability $P_1$ of obtaining a non-zero syndrome can be estimated
using eq. (13) but letting the sum run from 1 to g instead of t + 1 to g. This
gives $P_1 \simeq 32nr\gamma/3$ when $\epsilon = \gamma/n$. Examining table 1, we find $P_1$ is largest for the
distance 15 codes having n = 127 and 255, which give respectively $P_1$ = 0.25, 0.31
using $\gamma = 2 \times 10^{-5}$, $1.1 \times 10^{-5}$ respectively. The former case is satisfactory, since if
there are B blocks in the computer, then after the first syndrome extraction we have
2B available ancillas and we only need 7 × 0.25 B to complete the 8-fold repetition of
extraction of the non-zero syndromes. The latter case ($P_1$ = 0.31) is not satisfactory,
but becomes so if we reduce $\gamma$ to $1 \times 10^{-5}$. This confirms eq. (12).
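The ancilla budget in this paragraph reduces to a one-line inequality: with 2 spare ancillas per block and r - 1 further extractions for each block whose first syndrome was non-zero, we need (r - 1) P_1 ≤ 2. The sketch below checks the quoted cases. The rescaling of P_1 in the last line assumes P_1 is proportional to the gate error rate, as the linear estimate in the text suggests; the function name is hypothetical.

```python
def ancillas_sufficient(p_nonzero, r, spare_per_block=2):
    """True if the spare ancillas cover the r - 1 further extractions that a
    fraction p_nonzero of the blocks requires."""
    needed_per_block = (r - 1) * p_nonzero
    return needed_per_block <= spare_per_block

r = 8  # 8-fold repetition for the distance 15 codes (r = t + 1)
print(ancillas_sufficient(0.25, r))               # True:  7 * 0.25 = 1.75 <= 2
print(ancillas_sufficient(0.31, r))               # False: 7 * 0.31 = 2.17 > 2
print(ancillas_sufficient(0.31 * 1.0 / 1.1, r))   # True after rescaling the gate error rate
```

The largest tolerable value is 2/7, about 0.29, which sits between the two quoted cases.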

To conclude, fault tolerant quantum computing can work well with efficient quantum
error correcting codes such as the [[127,29,15]] CSS code obtained from the
classical [127, 78, 15] BCH code. The success relies on the ability to merge useful
measurements on the logical state with recovery operations, on careful network design
and on optimized use of ancillas. These insights allow large quantum computations,
such as factorization of 100 or 1000-digit numbers, to proceed on a quantum
computer about an order of magnitude smaller than previously thought, without
change in the necessary noise level. The fault tolerant quantum computer need only
be about one order of magnitude larger than the logical computer contained within
it.

The author is supported by the Royal Society and by St. Edmund Hall, Oxford. 
Table 1. Error rates and scale up required to run a quantum algorithm of size
$KQ = 2.15 \times 10^{12}$. The first column gives the parameters of the quantum code, which
is identified in section [|. The 2nd column gives the required success probability for
recovery of a single block (eq. (16)). The 3rd and 4th columns give the required
error rates (eqs. (13)-(15)), and the final column gives the scale-up in computer size
(eq. (12)).
Figure 1: Method to measure X io while simultaneously extracting the syndrome
for Z errors. The left hand network is that proposed in jTJj; it is replaced by
the right hand network. The horizontal lines represent logical qubits, the boxes
represent the measurement of the ancilla, from which a syndrome and the required
X io measurement can be deduced. The final arrow represents correction of the
deduced Z errors. The necessary repetition discussed in the text is not shown.



Figure 2: Network to perform X between two logical qubits in the same block, in 
this case the 2nd and 3rd qubits of the lower block. The horizontal lines indicate 
logical qubits, in this example there are three logical qubits per block. A small 
empty box represents a measurement. An operator in a dashed box indicated by 
an arrow is only carried out if the relevant measurement yields a 1. The zigzag 
lines are a visual aid to help keep track of the quantum information which moves 
between blocks when quantum teleportations take place. The horizontal lines are 
shown narrow when the relevant qubit is in the state $|0\rangle_L$.



Figure 3: Illustration of two ways to implement teleportation. The symbol on the 
left is a shorthand which is used in fig. 4. The two crossed boxes indicate the qubits 
which are placed in a Bell state, and the arrow indicates whence and whither the 
qubit is teleported. 



Figure 4: Fault-tolerant Toffoli gate on three qubits in the same block. If the data
block at the bottom is initially in the state $|\psi\rangle_L$, then the third ancillary block ends
up in the state $T|\psi\rangle_L$. The operations in the dashed box are only carried out if
the indicated measurement yields a 1. The figure does not show further operations
which are required if the other measurements yield 1s, nor the repetition of the initial
"measurement via cat" which prepares the ancillas (see text). The encircled zeros
and the encircled plus sign are a visual reminder that those qubits are in the states
$|0\rangle_L$ and $|+\rangle_L$, respectively, which allows the teleportation and switching operations
to function.



Figure 5: Switching the i'th bit out of a block (a), and back in again (b). 